Temporal Selective Max Pooling Towards Practical Face Recognition

Author

Xiang Xiang

Abstract

In this report, we address two challenges in building a real-world face recognition system: pose variation in uncontrolled environments and the computational expense of processing a video stream. First, we argue that the frame-wise feature mean cannot characterize the variation among frames. If the video feature is to represent the subject's identity, we propose preserving the overall pose diversity. Since pose varies even within a single video, identity then becomes the only source of variation across videos. Following this idea of untangling variation, we present a pose-robust face verification algorithm in which each video is represented as a bag of frame-wise CNN features. Second, instead of simply using all the frames, we focus the algorithm on key frame selection. This is achieved by pose quantization using pose distances to K-means centroids, which reduces the number of feature vectors from hundreds to K while still preserving the overall diversity. Recognition is implemented with a rank-list of one-to-one similarities (i.e., verification) using the proposed video representation. On the official 5000 video pairs of the YouTube Face dataset, our algorithm achieves performance comparable to the state of the art, which averages deep features over all frames. Although it is verified on a public dataset, the proposed generic algorithm is also applicable to real-world systems.
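The key frame selection described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: it assumes per-frame pose estimates (for example yaw, pitch, roll) and per-frame CNN features are already available as NumPy arrays, and it picks the frame closest to each K-means centroid as a key frame. The function names (`kmeans`, `select_key_frames`, `video_similarity`) are hypothetical, and scoring a video pair by the maximum cosine similarity over key-frame pairs is an assumption made for illustration; the abstract does not specify the scoring rule.

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means; returns the k centroids."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(points), size=k, replace=False)
    centroids = points[start].astype(float)
    for _ in range(iters):
        # Distance from every point to every centroid: shape (N, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids

def select_key_frames(poses, k, seed=0):
    """Quantize poses with K-means and return the index of the frame
    nearest each centroid (at most k unique indices, sorted)."""
    centroids = kmeans(poses, k, seed=seed)
    dists = np.linalg.norm(poses[:, None, :] - centroids[None, :, :], axis=2)
    return np.unique(dists.argmin(axis=0))

def video_similarity(feats_a, feats_b):
    """Assumed scoring rule: maximum cosine similarity over all pairs
    of key-frame features from the two videos."""
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    return float((a @ b.T).max())

# Synthetic stand-ins for one video: 200 frames, 3-D pose, 16-D features.
rng = np.random.default_rng(1)
poses = rng.normal(size=(200, 3))
features = rng.normal(size=(200, 16))

key_idx = select_key_frames(poses, k=5)
bag = features[key_idx]  # the video is now a bag of at most 5 feature vectors
```

With this representation, verifying a pair of videos compares two small bags of features instead of hundreds of frame-level vectors, which is where the computational saving mentioned in the abstract would come from.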


Similar resources

Emergence of Selective Invariance in Hierarchical Feed Forward Networks

Many theories have emerged which investigate how invariance is generated in hierarchical networks through simple schemes such as max and mean pooling. The restriction to max/mean pooling in theoretical and empirical studies has diverted attention away from a more general way of generating invariance to nuisance transformations. In this exploratory study, we study the conjecture that hierarchica...


Learning Robust Deep Face Representation

With the development of convolutional neural networks, more and more researchers focus their attention on the advantages of CNNs for the face recognition task. In this paper, we propose a deep convolutional network for learning a robust face representation. The deep convolutional net is constructed from 4 convolution layers, 4 max pooling layers and 2 fully connected layers, which in total contains about 4M pa...


Action Representation Using Classifier Decision Boundaries

Most popular deep learning based models for action recognition are designed to generate separate predictions within their short temporal windows, which are often aggregated by heuristic means to assign an action label to the full video segment. Given that not all frames from a video characterize the underlying action, pooling schemes that impose equal importance to all frames might be unfavorab...


Successful Decoding of Famous Faces in the Fusiform Face Area

What are the neural mechanisms of face recognition? It is believed that the network of face-selective areas, which spans the occipital, temporal, and frontal cortices, is important in face recognition. A number of previous studies indeed reported that face identity could be discriminated based on patterns of multivoxel activity in the fusiform face area and the anterior temporal lobe. However, ...


Second-order Temporal Pooling for Action Recognition

Most successful deep learning models for action recognition generate predictions for short video clips, which are later aggregated into a longer time-frame action descriptor by computing a statistic over these predictions. Zeroth (max) or first order (average) statistic are commonly used. In this paper, we explore the benefits of using second-order statistics. Specifically, we propose a novel e...



Journal:

CoRR

Volume abs/1609.07042

Publication date: 2016
